Skip to content

policies: treat SERIAL/LOCAL_SERIAL consistency as LWT for routing#887

Open
mykaul wants to merge 2 commits into
scylladb:masterfrom
mykaul:fix-serial-consistency-lwt-routing
Open

policies: treat SERIAL/LOCAL_SERIAL consistency as LWT for routing#887
mykaul wants to merge 2 commits into
scylladb:masterfrom
mykaul:fix-serial-consistency-lwt-routing

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented May 13, 2026

Summary

  • TokenAwarePolicy now skips replica shuffling for statements with SERIAL or LOCAL_SERIAL consistency level, treating them as LWT-routing candidates
  • Previously only is_lwt() (from server prepare metadata) was checked, meaning serial-consistency reads got shuffled replicas instead of deterministic Paxos-optimal ordering
  • Added unit test validating no-shuffle behavior for both vnode and tablet clusters with serial consistency

Fixes: #886

Related: scylladb/java-driver#885

Statements with SERIAL or LOCAL_SERIAL consistency level are serialized
through the Paxos path on the server, but TokenAwarePolicy only checked
is_lwt() (from server prepare metadata) when deciding whether to skip
replica shuffling. This meant serial-consistency reads could be routed
with shuffled replicas instead of the deterministic order needed for
optimal Paxos coordination.

Now TokenAwarePolicy also checks the statement's consistency level and
skips shuffling for SERIAL/LOCAL_SERIAL, matching LWT routing behavior.

Fixes: scylladb#886
@mykaul mykaul force-pushed the fix-serial-consistency-lwt-routing branch from 76503d0 to c14edfa Compare May 19, 2026 07:44
@sylwiaszunejko sylwiaszunejko requested a review from Copilot May 20, 2026 07:40
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates TokenAwarePolicy routing behavior so statements executed with SERIAL / LOCAL_SERIAL consistency are treated like LWT-routing candidates (replica order preserved), matching expected Paxos/serial-path coordinator selection and fixing the replica-shuffle behavior described in #886.

Changes:

  • Skip replica shuffling in TokenAwarePolicy when query.consistency_level is SERIAL or LOCAL_SERIAL.
  • Add a unit test ensuring shuffle() is not called for serial-consistency statements on both vnode and tablet-style metadata setups.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
cassandra/policies.py Adjusts shuffle condition to preserve replica order for serial-consistency statements (in addition to LWT).
tests/unit/test_policies.py Adds regression test asserting shuffle() is not invoked for SERIAL / LOCAL_SERIAL statements.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cassandra/policies.py
Copy link
Copy Markdown
Collaborator

@sylwiaszunejko sylwiaszunejko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, @dkropachev could you also take a look?

Copy link
Copy Markdown
Collaborator

@sylwiaszunejko sylwiaszunejko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that this part is not addressed:
when a statement is executed with SERIAL or LOCAL_SERIAL as regular
consistency (serial/Paxos read), retry policies must not downgrade it
to a non-serial consistency level like ONE or QUORUM, as that would
break the serial read guarantees.

@mykaul
Copy link
Copy Markdown
Author

mykaul commented May 20, 2026

I missed that this part is not addressed: when a statement is executed with SERIAL or LOCAL_SERIAL as regular consistency (serial/Paxos read), retry policies must not downgrade it to a non-serial consistency level like ONE or QUORUM, as that would break the serial read guarantees.

Added.

Add a guard in the retry execution path that prevents any retry policy
from downgrading SERIAL/LOCAL_SERIAL to a non-serial consistency level,
which would break serial read (Paxos) guarantees.

Also add a unit test verifying DowngradingConsistencyRetryPolicy does
not downgrade serial consistency on read timeout or unavailable.
@mykaul mykaul force-pushed the fix-serial-consistency-lwt-routing branch from 7347fe3 to 3b69e2a Compare May 20, 2026 13:00
Copy link
Copy Markdown
Collaborator

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR, but you forgot to adjust built-in retry policies to not even try to downgrade.

Comment thread cassandra/cluster.py
Comment on lines +5438 to +5449
if not ConsistencyLevel.is_serial(consistency_level):
original_cl = self.message.consistency_level
if ConsistencyLevel.is_serial(original_cl):
log.debug(
"Retry policy attempted to downgrade serial consistency %s to %s; "
"keeping original consistency level.",
ConsistencyLevel.value_to_name.get(original_cl, original_cl),
ConsistencyLevel.value_to_name.get(consistency_level, consistency_level))
else:
self.message.consistency_level = consistency_level
else:
self.message.consistency_level = consistency_level
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too over complicated, can you make it a single if-else please

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Actually AInhad a single if and I thought it be better to split.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TokenAwarePolicy does not route SERIAL/LOCAL_SERIAL statements as LWT requests

4 participants